Adaptive Exploration Using Stochastic Neurons

Authors

  • Michel Tokic
  • Günther Palm
Abstract

Stochastic neurons are deployed for efficient adaptation of exploration parameters by gradient-following algorithms. The approach is evaluated in model-free temporal-difference learning using discrete actions. Its main advantage is memory efficiency, because exploratory data need only be memorized for starting states. Hence, if a learning problem consists of only one starting state, the exploratory data can be treated as global. Results suggest that the presented approach can be efficiently combined with standard off- and on-policy algorithms such as Q-learning and Sarsa.
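
For readers unfamiliar with the two algorithms named in the abstract, the off-policy versus on-policy distinction can be sketched as below. This is a generic textbook illustration of the temporal-difference targets, not the authors' implementation and not their exploration-adaptation method; the function names and the tabular list-of-lists layout for `q` are assumptions made for this example.

```python
def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: bootstrap from the greedy (maximal) next action value."""
    return reward + gamma * max(q[next_state])


def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: bootstrap from the action the agent actually takes next."""
    return reward + gamma * q[next_state][next_action]


def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a step of size alpha toward the given target."""
    q[state][action] += alpha * (target - q[state][action])
    return q


# Hypothetical two-state, two-action table used only for illustration.
q = [[0.0, 0.0], [1.0, 3.0]]
off_policy = q_learning_target(q, reward=1.0, next_state=1, gamma=0.5)
on_policy = sarsa_target(q, reward=1.0, next_state=1, next_action=0, gamma=0.5)
td_update(q, state=0, action=0, target=off_policy, alpha=0.1)
```

The two targets differ only when the exploratory next action is not the greedy one, which is why the choice of exploration parameters affects Sarsa's learned values directly.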

Similar resources

Market Adaptive Control Function Optimization in Continuous Cover Forest Management

Economically optimal management of a continuous cover forest is considered here. Initially, there is a large number of trees of different sizes and the forest may contain several species. We want to optimize the harvest decisions over time, using continuous cover forestry, which is denoted by CCF. We maximize our objective function, the expected present value, with consideration of stochastic p...

Adaptive Fractional-order Control for Synchronization of Two Coupled Neurons in the External Electrical Stimulation

This paper addresses the synchronization of two coupled chaotic FitzHugh–Nagumo (FHN) neurons with a weak gap junction under external electrical stimulation (EES). To transmit information among coupled neurons, the integer-order FHN equations of the coupled system are generalized to fractional order in the frequency domain using the Crone approach, so that the behavior of each coupled neuron relies on its pa...

Target Detection in Bistatic Passive Radars by Using Adaptive Processing Based on Correntropy Cost Function

In this paper, a novel method is introduced for target detection in bistatic passive radars, which uses the concept of correntropy to distinguish correct targets from false detections. In the proposed method, the history of each cell of the ambiguity function is modeled as a stochastic process. The stochastic processes consisting of noise are then differentiated from those consisting of targets by constructing...

Information Complexity in Bandit Subset Selection

We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset of a specified size. Under the PAC and the fixed-budget formulations, we derive improved bounds by using KL-divergence-based confidence intervals. Whereas the application of a similar idea in the regret setting has yielded bounds in terms of the KL-divergence between the arms, our bounds...

Publication date: 2012